mamba2: remove hardcoded 2x expansion factor and invalid d_inner % d_state check #23082
|
Hi @limloop, thanks for your contribution! Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:
Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below. |
CISC
left a comment
Do you have links to models with differing expand?
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
|
@CISC, here's a model with non-standard limloop/whiff-mamba2-50M-v0.1 Config values:
With current With my changes (this PR), it loads and generates coherent text. |
|
Thanks, rebase and adjust accordingly to refactor please (moved to |
|
@CISC updated, ready for review |
CISC
left a comment
Sorry for the long delay, I had hoped @compilade would take a look.
…state check (ggml-org#23082)

* mamba2: remove hardcoded 2x expansion factor, support any expand value
* mamba2: remove invalid d_inner %% d_state check (unrelated parameters)
* Update convert_hf_to_gguf.py: make expand optional with default 2

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* mamba2: apply expand fix to refactored conversion/mamba.py
* also check for mamba_expand

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Sigbjørn Skjæret <1629204+CISC@users.noreply.github.com>
This PR removes two unnecessary restrictions in Mamba2 that prevent loading models with custom architectures.
Changes:
* Remove the hardcoded 2x expansion factor (GGML_ASSERT(2 * n_embd == d_inner)). The assertion only holds for expand=2, and expand is not stored in GGUF, only d_inner is.
* Remove the invalid d_inner % d_state check. d_inner and d_state are unrelated parameters.
Testing:
* Model with expand=2: loads and runs correctly
* Model with expand=1, d_inner=512, d_model=512: loads and generates coherent output
Backward compatibility: Models with expand=2 work identically to before.
Related discussion: #21346
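The relationship the description relies on can be sketched in a few lines of Python. This is an illustration only, not the code from this PR. It assumes d_inner = expand * d_model, and that a config may give the expansion factor under either expand or mamba_expand with a default of 2, as the commit messages suggest. The helper name resolve_d_inner and the plain-dict config are hypothetical.

```python
def resolve_d_inner(hparams: dict) -> int:
    """Hypothetical helper: derive d_inner from a config-style dict."""
    d_model = hparams["d_model"]
    # Accept either key for the expansion factor; default to 2 when absent.
    expand = hparams.get("expand", hparams.get("mamba_expand", 2))
    # An explicitly stored d_inner takes precedence over the derived value.
    d_inner = hparams.get("d_inner")
    if d_inner is None:
        d_inner = expand * d_model
    return d_inner


# Config without an explicit expand: falls back to the default of 2.
standard = {"d_model": 768}
# Config matching the tested case in the description (expand=1, d_model=512).
custom = {"d_model": 512, "expand": 1}

for cfg in (standard, custom):
    d_inner = resolve_d_inner(cfg)
    # The removed assertion corresponds to this comparison.
    old_check_passes = (2 * cfg["d_model"] == d_inner)
    print(cfg, "d_inner =", d_inner, "old 2x check passes:", old_check_passes)
```

With the second config the old 2x comparison is false even though the model is self-consistent, which is the situation the removed assertion rejected.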